RANDOM MATRICES: 
THE FOUR MOMENT THEOREM FOR WIGNER ENSEMBLES 



TERENCE TAO AND VAN VU



Abstract. We survey some recent progress on rigorously establishing the 
universality of various spectral statistics of Wigner random matrix ensembles, 
focusing in particular on the Four Moment Theorem and its applications. 



1. Introduction 

The purpose of this paper is to survey the Four Moment Theorem and its
applications in understanding the asymptotic spectral properties of random matrix
ensembles of Wigner type. Due to limitations of space, this survey will be far from
exhaustive; an extended version of this survey will appear elsewhere. (See also [14],
[29], [44] for some recent surveys in this area.)

To simplify the exposition (at the expense of stating the results in maximum 
generality), we shall restrict attention to a model class of random matrix ensembles, 
in which we assume somewhat more decay and identical distribution hypotheses 
than are strictly necessary for the main results. 

Definition 1 (Wigner matrices). Let $n \ge 1$ be an integer (which we view as a
parameter going off to infinity). An $n \times n$ Wigner Hermitian matrix $M_n$ is defined
to be a random Hermitian $n \times n$ matrix $M_n = (\xi_{ij})_{1 \le i,j \le n}$, in which the $\xi_{ij}$ for
$1 \le i \le j \le n$ are jointly independent with $\xi_{ji} = \overline{\xi_{ij}}$ (in particular, the $\xi_{ii}$ are
real-valued). For $1 \le i < j \le n$, we require that the $\xi_{ij}$ have mean zero and
variance one, while for $1 \le i = j \le n$ we require that the $\xi_{ij}$ have mean zero
and variance $\sigma^2$ for some $\sigma^2 > 0$ independent of $i,j,n$. To simplify some of the
statements of the results here, we will also assume that the $\xi_{ij} \equiv \xi$ are identically
distributed for $i < j$, and the $\xi_{ii} \equiv \xi'$ are also identically distributed for $i = j$, and
furthermore that the real and imaginary parts of $\xi$ are independent. We refer to
the distributions $\mathrm{Re}\,\xi$, $\mathrm{Im}\,\xi$, and $\xi'$ as the atom distributions of $M_n$.

We say that the Wigner matrix ensemble obeys Condition C0 if we have the
exponential decay condition

    $\mathbf{P}(|\xi_{ij}| \ge t^C) \le e^{-t}$

for all $1 \le i,j \le n$ and $t \ge C'$, and some constants $C, C'$ (independent of $i,j,n$).

We refer to the matrix $W_n := \frac{1}{\sqrt{n}} M_n$ as the coarse-scale normalised Wigner
Hermitian matrix, and $A_n := \sqrt{n} M_n$ as the fine-scale normalised Wigner Hermitian
matrix.

Example 2 (Invariant ensembles). An important special case of a Wigner Hermitian
matrix $M_n$ is the gaussian unitary ensemble (GUE), in which $\xi_{ij} \equiv N(0,1)_{\mathbb{C}}$ are
complex gaussians with mean zero and variance one for $i \ne j$, and $\xi_{ii} \equiv N(0,1)_{\mathbb{R}}$
are real gaussians with mean zero and variance one for $1 \le i \le n$ (thus $\sigma^2 = 1$
in this case). Another important special case is the gaussian orthogonal ensemble
(GOE), in which $\xi_{ij} \equiv N(0,1)_{\mathbb{R}}$ are real gaussians with mean zero and variance
one for $i \ne j$, and $\xi_{ii} \equiv N(0,2)_{\mathbb{R}}$ are real gaussians with mean zero and variance
2 for $1 \le i \le n$ (thus $\sigma^2 = 2$ in this case). These ensembles obey Condition
C0. These ensembles are invariant with respect to conjugation by unitary and
orthogonal matrices respectively.
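As an informal illustration of Definition 1 and Example 2, the following minimal sketch samples a GOE-type matrix and forms the two normalisations $W_n$ and $A_n$. The function names `sample_goe` and `scale`, the seed, and the size n = 200 are hypothetical choices made only for this example; the last line computes $\mathrm{tr}(W_n^2)/n$, which should be close to the second moment of the semi-circular law (namely 1) for large n.

```python
import math
import random

def sample_goe(n, rng):
    """Sample an n x n real symmetric matrix with N(0,1) entries off the
    diagonal and N(0,2) entries on the diagonal (the GOE of Example 2)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = rng.gauss(0.0, math.sqrt(2.0))  # diagonal variance sigma^2 = 2
        for j in range(i + 1, n):
            x = rng.gauss(0.0, 1.0)               # off-diagonal variance one
            m[i][j] = x
            m[j][i] = x                           # real case: xi_ji = xi_ij
    return m

def scale(m, c):
    """Multiply every entry of the matrix m by the scalar c."""
    return [[c * x for x in row] for row in m]

rng = random.Random(0)             # fixed seed, chosen arbitrarily
n = 200
M = sample_goe(n, rng)
W = scale(M, 1.0 / math.sqrt(n))   # coarse-scale normalisation W_n
A = scale(M, math.sqrt(n))         # fine-scale normalisation A_n

# tr(W_n^2)/n equals the sum of squared entries of W_n divided by n.
second_moment = sum(x * x for row in W for x in row) / n
```

Note that $A_n = n W_n$, so the two normalisations differ only by a deterministic dilation of the spectrum by a factor of n.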

Given an $n \times n$ Hermitian matrix $A$, we will denote its $n$ eigenvalues in increasing
order as

    $\lambda_1(A) \le \dots \le \lambda_n(A)$,

and write $\lambda(A) := (\lambda_1(A), \dots, \lambda_n(A))$. We also let $u_1(A), \dots, u_n(A) \in \mathbb{C}^n$ be an
orthonormal basis of eigenvectors of $A$ with $A u_i(A) = \lambda_i(A) u_i(A)$.

We also introduce the eigenvalue counting function

(1)  $N_I(A) := |\{1 \le i \le n : \lambda_i(A) \in I\}|$

for any interval $I \subset \mathbb{R}$. We will be interested in both the coarse-scale eigenvalue
counting function $N_I(W_n)$ and the fine-scale eigenvalue counting function $N_I(A_n)$.
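The counting function (1) can be sketched in a few lines. In the example below the spectrum `w_eigs` is a made-up list of numbers (not eigenvalues of an actual Wigner matrix), and the name `counting_function` is hypothetical. The sketch also illustrates that, since $A_n = n W_n$, counting eigenvalues of $W_n$ in $I$ is the same as counting eigenvalues of $A_n$ in the dilated interval $nI$.

```python
def counting_function(eigenvalues, a, b):
    """N_I(A) for the closed interval I = [a, b], given the eigenvalues of A."""
    return sum(1 for lam in eigenvalues if a <= lam <= b)

n = 5
w_eigs = [-1.5, -0.2, 0.0, 0.7, 1.9]   # illustrative spectrum of W_n (made up)
a_eigs = [n * lam for lam in w_eigs]   # spectrum of A_n = n W_n

coarse = counting_function(w_eigs, -0.5, 1.0)         # N_I(W_n) with I = [-0.5, 1]
fine = counting_function(a_eigs, n * -0.5, n * 1.0)   # N_{nI}(A_n)
```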



2. The local semi-circular law 



The most fundamental result about the spectrum of Wigner matrices is the Wigner
semi-circular law. We state here a powerful local version of this law, due to Erdős,
Schlein, and Yau [20], [21], [22] (see also [25], [26], [15], [H] for further refinements).
Denote by $\rho_{sc}$ the semi-circle density function with support on $[-2, 2]$,

(2)  $\rho_{sc}(x) := \frac{1}{2\pi} (4 - x^2)_+^{1/2}.$

Theorem 3 (Local semi-circle law). Let $M_n$ be a Wigner matrix obeying Condition
C0, let $\varepsilon > 0$, and let $I \subset \mathbb{R}$ be an interval of length $|I| \ge n^{-1+\varepsilon}$. Then with
overwhelming probability^1, one has^2

(3)  $N_I(W_n) = n \int_I \rho_{sc}(x)\,dx + o(n|I|).$

^1 By this, we mean that the event occurs with probability $1 - O_A(n^{-A})$ for each $A > 0$.

^2 We use the asymptotic notation $o(X)$ to denote any quantity that goes to zero as $n \to \infty$
when divided by $X$, and $O(X)$ to denote any quantity bounded in magnitude by $CX$, where $C$ is
a constant independent of $n$.
Proof. See e.g. [47, Theorem 1.10]. For the most precise estimates currently known
of this type (and with the weakest decay hypotheses on the entries), see [15]. The
proofs are based on the Stieltjes transform method; see e.g. [1] for an exposition of
this method. □



A variant of Theorem 3, established subsequently^3 in [26], is the extremely useful
eigenvalue rigidity property

(4)  $\lambda_i(W_n) = \lambda_i^{cl}(W_n) + O_\varepsilon(n^{-1+\varepsilon}),$

valid with overwhelming probability in the bulk range $\delta n \le i \le (1 - \delta) n$ for any
fixed $\delta > 0$ (and assuming Condition C0). This result is key in some of the strongest
applications of the theory. Here the classical location $\lambda_i^{cl}(W_n)$ of the $i$-th eigenvalue
is the element of $[-2, 2]$ defined by the formula

    $\int_{-2}^{\lambda_i^{cl}(W_n)} \rho_{sc}(y)\,dy = \frac{i}{n}.$
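The classical locations can be computed numerically. The sketch below uses the closed-form antiderivative of the semi-circle density, $\int_{-2}^{x} \rho_{sc}(y)\,dy = \frac{1}{2} + \frac{x\sqrt{4-x^2}}{4\pi} + \frac{1}{\pi}\arcsin(x/2)$, and solves for the $i/n$ quantile by bisection. The function names and the tolerance are hypothetical choices for this illustration.

```python
import math

def semicircle_cdf(x):
    """Integral of the semi-circle density rho_sc from -2 to x."""
    x = max(-2.0, min(2.0, x))   # clamp to the support [-2, 2]
    return (0.5
            + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi)
            + math.asin(x / 2.0) / math.pi)

def classical_location(i, n, tol=1e-12):
    """Solve semicircle_cdf(x) = i/n for x in [-2, 2] by bisection."""
    target = i / n
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Classical locations for n = 100 at the quartiles and the right edge.
median = classical_location(50, 100)
lower_quartile = classical_location(25, 100)
upper_quartile = classical_location(75, 100)
right_edge = classical_location(100, 100)
```

By the symmetry of the semi-circle density, the median location is 0 and the two quartile locations are negatives of each other.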



Roughly speaking, results such as Theorem 3 and (4) control the spectrum of $W_n$
at scales $n^{-1+\varepsilon}$ and above. However, they break down at the fine scale $n^{-1}$; indeed,
for intervals $I$ of length $|I| = O(1/n)$, one has $n \int_I \rho_{sc}(x)\,dx = O(1)$, while $N_I(W_n)$
is clearly a natural number, so that one can no longer expect an asymptotic of the
form (3). Nevertheless, local semicircle laws are an essential part of the fine-scale
theory. One particularly useful consequence of these laws is that of eigenvector
delocalisation:

Corollary 4 (Eigenvector delocalisation). Let $M_n$ be a Wigner matrix obeying
Condition C0, and let $\varepsilon > 0$. Then with overwhelming probability, one has
$u_i(W_n)^* e_j = O(n^{-1/2+\varepsilon})$ for all $1 \le i,j \le n$, where $e_1, \dots, e_n$ are the standard basis of $\mathbb{C}^n$.



Note from Pythagoras' theorem that $\sum_{j=1}^n |u_i(M_n)^* e_j|^2 = \|u_i(M_n)\|^2 = 1$; thus
Corollary 4 asserts, roughly speaking, that the coefficients of each eigenvector are
as spread out (or delocalised) as possible.

Corollary 4 can be established in a number of ways. One particularly slick
approach proceeds via control of the resolvent (or Green's function) $(W_n - z)^{-1}$,
taking advantage of the identity

    $\mathrm{Im}\,((W_n - z)^{-1})_{jj} = \sum_{i=1}^n \frac{\eta}{(\lambda_i(W_n) - E)^2 + \eta^2} |u_i(W_n)^* e_j|^2$

for $z = E + \sqrt{-1}\,\eta$; it turns out that the machinery used to prove Theorem 3 also can
be used to control the resolvent. See for instance [M] for details of this approach.



^3 The result in [26] actually proves a more precise result that also gives sharp results in the
edge of the spectrum, though due to the sparser nature of the $\lambda_i^{cl}(W_n)$ in that case, the error
term $O_\varepsilon(n^{-1+\varepsilon})$ must be enlarged.
3. GUE and gauss divisible ensembles



We now turn to the question of the fine-scale behavior of eigenvalues of Wigner
matrices, starting with the model case of GUE. Here, it is convenient to work with
the fine-scale normalisation $A_n := \sqrt{n} M_n$. For simplicity we will restrict attention
to the bulk region of the spectrum, which in the fine-scale normalisation corresponds
to eigenvalues $\lambda_i(A_n)$ of $A_n$ that are near $nu$ for some fixed $-2 < u < 2$ independent
of $n$.

A basic object of study is the k-point correlation function $R_n^{(k)} = R_n^{(k)}(A_n) :
\mathbb{R}^k \to \mathbb{R}^+$, defined via duality to be the unique symmetric function (or measure)
for which one has

(5)  $\int_{\mathbb{R}^k} F(x_1,\dots,x_k) R_n^{(k)}(x_1,\dots,x_k)\,dx_1 \dots dx_k = k! \sum_{1 \le i_1 < \dots < i_k \le n} \mathbf{E} F(\lambda_{i_1}(A_n), \dots, \lambda_{i_k}(A_n))$

for all symmetric continuous compactly supported functions $F : \mathbb{R}^k \to \mathbb{R}$.
Alternatively, one can write

    $R_n^{(k)}(x_1,\dots,x_k) = \frac{n!}{(n-k)!} \int_{\mathbb{R}^{n-k}} \rho_n(x_1,\dots,x_n)\,dx_{k+1} \dots dx_n$

where $\rho_n := \frac{1}{n!} R_n^{(n)}$ is the symmetrized joint probability distribution of all $n$
eigenvalues of $A_n$.

From the semi-circular law, we expect that at the energy level $nu$ for some $-2 <
u < 2$, the eigenvalues of $A_n$ will be spaced with average spacing $1/\rho_{sc}(u)$. It is thus
natural to consider the normalised k-point correlation function $\rho^{(k)}_{n,u} = \rho^{(k)}_{n,u}(A_n) :
\mathbb{R}^k \to \mathbb{R}^+$ defined by the formula

(6)  $\rho^{(k)}_{n,u}(x_1,\dots,x_k) := R_n^{(k)}\left(nu + \frac{x_1}{\rho_{sc}(u)}, \dots, nu + \frac{x_k}{\rho_{sc}(u)}\right).$



It has been generally believed (and in many cases explicitly conjectured; see e.g. 

page 9]) that the asymptotic statistics for the quantities mentioned above are 
universal, in the sense that the limiting laws do not depend on the distribution of
the atom variables (assuming of course that they have been normalised as stated in
Definition 1). This phenomenon was motivated by examples of similarly universal
laws in physics, such as the laws of thermodynamics or of critical percolation; see 
e.g. [Sni Uni [TT] for further discussion. 

It is clear that if one is able to prove the universality of a limiting law, then it
suffices to compute this law for one specific model in order to describe the asymptotic
behaviour for all other models. A natural choice for the specific model is GUE, as
for this model, many limiting laws can be computed directly thanks to the
availability of an explicit formula for the joint distribution of the eigenvalues, as well
as the useful identities of determinantal processes. For instance, one has Ginibre's
formula

(7)  $\rho_n(x_1,\dots,x_n) = C_n\, e^{-|x|^2/2n} \prod_{1 \le i < j \le n} |x_i - x_j|^2$

(with $C_n > 0$ an explicit normalising constant depending only on $n$)
for the joint eigenvalue distribution, as can be verified from a standard calculation;
see [28]. From this formula, the theory of determinantal processes, and asymptotics
for Hermite polynomials, one can then obtain the limiting law

(8)  $\lim_{n \to \infty} \rho^{(k)}_{n,u}(x_1,\dots,x_k) = \rho^{(k)}_{Sine}(x_1,\dots,x_k)$

locally uniformly in $x_1, \dots, x_k$, where

    $\rho^{(k)}_{Sine}(x_1,\dots,x_k) := \det(K_{Sine}(x_i, x_j))_{1 \le i,j \le k}$

and $K_{Sine}$ is the Dyson sine kernel

    $K_{Sine}(x,y) := \frac{\sin(\pi(x - y))}{\pi(x - y)}$

(with the usual convention that $\frac{\sin x}{x}$ equals 1 at the origin); see [28], [32].
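To make the limiting law concrete, the following sketch evaluates the Dyson sine kernel and the two-point function, assuming the standard determinantal form $\rho^{(2)}_{Sine}(x_1,x_2) = \det(K_{Sine}(x_i,x_j))_{1 \le i,j \le 2}$. The function names are hypothetical. The two-point function vanishes when $x_1 = x_2$, which reflects the repulsion between nearby eigenvalues, and equals 1 (up to rounding) when $x_1 - x_2$ is a nonzero integer.

```python
import math

def k_sine(x, y):
    """Dyson sine kernel sin(pi (x - y)) / (pi (x - y)), equal to 1 on the diagonal."""
    d = x - y
    if d == 0.0:
        return 1.0
    return math.sin(math.pi * d) / (math.pi * d)

def rho2_sine(x1, x2):
    """Two-point function: the 2 x 2 determinant det(K(x_i, x_j))."""
    return k_sine(x1, x1) * k_sine(x2, x2) - k_sine(x1, x2) * k_sine(x2, x1)

coincident = rho2_sine(0.3, 0.3)     # two eigenvalues at the same location
unit_separation = rho2_sine(0.0, 1.0)
half_separation = rho2_sine(0.0, 0.5)
```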

Using a general central limit theorem for determinantal processes due to
Costin-Lebowitz [7] and Soshnikov [46], one can then give a limiting law for $N_I(A_n)$ in
the case of the macroscopic intervals $I = [nu, +\infty)$. More precisely, one has the
central limit theorem

    $\frac{N_{[nu,+\infty)}(A_n) - n \int_u^2 \rho_{sc}(y)\,dy}{\sqrt{\frac{1}{2\pi^2} \log n}} \to N(0,1)_{\mathbb{R}}$

in the sense of probability distributions, for any $-2 < u < 2$; see [30]. By using
the counting functions $N_{[nu,+\infty)}$ to solve for the location of individual eigenvalues
$\lambda_i(A_n)$, one can then conclude the central limit theorem

(9)  $\frac{\lambda_i(A_n) - \lambda_i^{cl}(A_n)}{\sqrt{\log n / 2\pi^2} / \rho_{sc}(u)} \to N(0,1)_{\mathbb{R}}$

whenever $\lambda_i^{cl}(A_n) := n \lambda_i^{cl}(W_n)$ is equal to $n(u + o(1))$ for some fixed $-2 < u < 2$;
see [30].

The above analysis extends to many other classes of invariant ensembles (such as
GOE), for which the joint eigenvalue distribution has a form similar to (7); see [10]
for further discussion. Another important extension of the above results is to the
gauss divisible ensembles, which are Wigner matrices $M_n^t$ of the form

    $M_n^t = e^{-t/2} M_n^0 + (1 - e^{-t})^{1/2} G_n,$

where $G_n$ is a GUE matrix independent of $M_n^0$. In particular, the random matrix
$M_n^t$ is distributed as $M_n^0$ for $t = 0$ and then continuously deforms towards the GUE
distribution as $t \to +\infty$. By using explicit formulae for the eigenvalue distribution
of a gauss divisible matrix, Johansson [31] was able^4 to extend the asymptotic (8)
for the k-point correlation function from GUE to the more general class of gauss
divisible matrices with fixed parameter $t > 0$ (independent of $n$).
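A short sketch may help explain the particular weights in the gauss divisible interpolation: because the two summands are independent, the variance of each entry of $M_n^t$ is $e^{-t} \cdot \mathrm{Var}(M^0\text{ entry}) + (1 - e^{-t}) \cdot \mathrm{Var}(G\text{ entry})$, which stays equal to one when both inputs have unit variance. The function names below are hypothetical, and the computation covers only the variance bookkeeping, not the eigenvalue distribution.

```python
import math

def ou_weights(t):
    """Weights (e^{-t/2}, (1 - e^{-t})^{1/2}) applied to M^0 and to G, for t >= 0."""
    return math.exp(-t / 2.0), math.sqrt(1.0 - math.exp(-t))

def entry_variance(t, var0=1.0, var_g=1.0):
    """Variance of an entry of M^t, using independence of M^0 and G."""
    a, b = ou_weights(t)
    return a * a * var0 + b * b * var_g

start = ou_weights(0.0)                                   # all weight on M^0
variances = [entry_variance(t) for t in (0.0, 0.01, 1.0, 10.0)]
late_weight_on_gue = ou_weights(50.0)[1]                  # close to 1 for large t
```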



^4 Some additional technical hypotheses were assumed in [31], namely that the diagonal variance
$\sigma^2$ was equal to 1, that the real and imaginary parts of each entry of $M_n^0$ were independent, and
that the matrix entries had bounded $C_0$-th moment for some $C_0 > 6$.
It is of interest to extend this analysis to as small a value of $t$ as possible, since if
one could set $t = 0$ then one would obtain universality for all Wigner ensembles. By
optimising Johansson's method (and taking advantage of the local semi-circle law),
Erdős, Péché, Ramírez, Schlein, and Yau [18] were able to extend the universality of
(8) (interpreted in a suitably weak convergence topology, such as vague convergence)
to gauss divisible ensembles for $t$ as small as $n^{-1+\varepsilon}$ for any fixed $\varepsilon > 0$.

An important alternate approach to these results was developed by Erdős, Ramírez,
Schlein, Yau, and Yin [17], [53], [23], based on a stability analysis of the Dyson
Brownian motion [13] governing the evolution of the eigenvalues of a matrix
Ornstein-Uhlenbeck process. We refer to [M] for a discussion of this method. Among other
things, this argument reproves a weaker version of the result in [18] mentioned
earlier, in which one obtained universality for the asymptotic (8) after an additional
averaging in the energy parameter $u$. However, the method was simpler and more
flexible than that in [18], as it did not rely on explicit identities, and has since been
extended to many other types of ensembles, including the real symmetric analogue
of gauss divisible ensembles in which the role of GUE is replaced instead by GOE.

4. The Four Moment Theorem 

The results discussed above for invariant or gauss divisible ensembles can be
extended to more general Wigner ensembles via a powerful swapping method known
as the Lindeberg exchange strategy, introduced in Lindeberg's classic proof [36] of
the central limit theorem, and first applied to Wigner ensembles in [8]. This method
can be used to control expressions such as $\mathbf{E}F(M_n) - \mathbf{E}F(M_n')$, where $M_n, M_n'$ are
two (independent) Wigner matrices. If one can obtain bounds such as

    $\mathbf{E}F(M_n) - \mathbf{E}F(\tilde M_n) = o(1/n)$

when $\tilde M_n$ is formed from $M_n$ by replacing one of the diagonal entries $\xi_{ii}$ of $M_n$ by
the corresponding entry $\xi'_{ii}$ of $M_n'$, and bounds such as

    $\mathbf{E}F(M_n) - \mathbf{E}F(\tilde M_n) = o(1/n^2)$

when $\tilde M_n$ is formed from $M_n$ by replacing one of the off-diagonal entries $\xi_{ij}$ of $M_n$
with the corresponding entry $\xi'_{ij}$ of $M_n'$ (and also replacing $\xi_{ji} = \overline{\xi_{ij}}$ with $\xi'_{ji} = \overline{\xi'_{ij}}$,
to preserve the Hermitian property), then on summing an appropriate telescoping
series, one would be able to conclude asymptotic agreement of the statistics
$\mathbf{E}F(M_n)$ and $\mathbf{E}F(M_n')$:

(10)  $\mathbf{E}F(M_n) - \mathbf{E}F(M_n') = o(1).$
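The counting behind the telescoping sum can be written out as follows. This is only a sketch of the bookkeeping: the intermediate matrices $M^{(s)}$ are hypothetical notation introduced here (with $M^{(0)} = M_n$, $M^{(N)} = M_n'$, and consecutive matrices differing by one swap), and the $o(\cdot)$ bounds are assumed to hold uniformly over all swaps.

```latex
\begin{align*}
\mathbf{E}F(M_n) - \mathbf{E}F(M_n')
  &= \sum_{s=1}^{N} \left( \mathbf{E}F(M^{(s-1)}) - \mathbf{E}F(M^{(s)}) \right),
  \qquad N = n + \binom{n}{2}, \\
  &= \underbrace{n \cdot o(1/n)}_{\text{diagonal swaps}}
   + \underbrace{\frac{n(n-1)}{2} \cdot o(1/n^2)}_{\text{off-diagonal swaps}}
   = o(1).
\end{align*}
```

The different requirements $o(1/n)$ and $o(1/n^2)$ thus reflect only the different numbers of diagonal and off-diagonal entries.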

The Four Moment Theorem asserts, roughly speaking, that we can obtain
conclusions of the form (10) for suitable statistics $F$ as long as $M_n, M_n'$ match to fourth
order. More precisely, we have

^5 Technically, the matrices $\tilde M_n$ formed by such a swapping procedure are not Wigner matrices
as defined in Definition 1, because the diagonal or upper-triangular entries are no longer identically
distributed. However, all of the relevant estimates for Wigner matrices can be extended to the
non-identically-distributed case at the cost of making the notation slightly more complicated. As
this is a relatively minor issue, we will not discuss it further here.
Definition 5 (Matching moments). Let $k \ge 1$. Two complex random variables $\xi, \xi'$
are said to match to order $k$ if one has $\mathbf{E}\,\mathrm{Re}(\xi)^a \mathrm{Im}(\xi)^b = \mathbf{E}\,\mathrm{Re}(\xi')^a \mathrm{Im}(\xi')^b$
whenever $a, b \ge 0$ are integers such that $a + b \le k$.

In the model case when the real and imaginary parts of $\xi$ or of $\xi'$ are independent,
the matching moment condition simplifies to the assertion that $\mathbf{E}\,\mathrm{Re}(\xi)^a = \mathbf{E}\,\mathrm{Re}(\xi')^a$
and $\mathbf{E}\,\mathrm{Im}(\xi)^b = \mathbf{E}\,\mathrm{Im}(\xi')^b$ for all $0 \le a, b \le k$.
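For real random variables with finite support, Definition 5 can be checked directly. The sketch below is restricted to the real case (so only the moments $\mathbf{E}\xi^j$ for $1 \le j \le k$ are compared), and the two discrete distributions are illustrative choices that do not appear in the text: a three-point distribution on $\{-\sqrt{3}, 0, \sqrt{3}\}$, which matches the real gaussian $N(0,1)_{\mathbb{R}}$ to order 4, and the symmetric Bernoulli distribution on $\{-1, +1\}$, which matches it only to order 3 because its fourth moment is 1 rather than 3.

```python
import math

def moments(dist, k):
    """First k moments of a finite real distribution given as [(value, prob), ...]."""
    return [sum(p * v ** j for v, p in dist) for j in range(1, k + 1)]

def matches(moments1, moments2, tol=1e-12):
    """True if the two moment lists agree entrywise up to the tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(moments1, moments2))

# E g^j for j = 1, ..., 4 when g is a standard real gaussian.
GAUSSIAN_MOMENTS = [0.0, 1.0, 0.0, 3.0]

s = math.sqrt(3.0)
three_point = [(-s, 1.0 / 6.0), (0.0, 2.0 / 3.0), (s, 1.0 / 6.0)]
bernoulli = [(-1.0, 0.5), (1.0, 0.5)]

three_point_order4 = matches(moments(three_point, 4), GAUSSIAN_MOMENTS)
bernoulli_order3 = matches(moments(bernoulli, 3), GAUSSIAN_MOMENTS[:3])
bernoulli_order4 = matches(moments(bernoulli, 4), GAUSSIAN_MOMENTS)
```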

Theorem 6 (Four Moment Theorem). Let $c_0 > 0$ be a sufficiently small constant.
Let $M_n = (\xi_{ij})_{1 \le i,j \le n}$ and $M_n' = (\xi'_{ij})_{1 \le i,j \le n}$ be two Wigner matrices obeying
Condition C0. Assume furthermore that for any $1 \le i < j \le n$, $\xi_{ij}$ and $\xi'_{ij}$ match
to order 4 and for any $1 \le i \le n$, $\xi_{ii}$ and $\xi'_{ii}$ match to order 2. Set $A_n := \sqrt{n} M_n$
and $A_n' := \sqrt{n} M_n'$, let $1 \le k \le n^{c_0}$ be an integer, and let $G : \mathbb{R}^k \to \mathbb{R}$ be a smooth
function obeying the derivative bounds

(11)  $|\nabla^j G(x)| \le n^{c_0}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$. Then for any $1 \le i_1 < i_2 < \dots < i_k \le n$, and for $n$
sufficiently large we have

(12)  $|\mathbf{E}(G(\lambda_{i_1}(A_n), \dots, \lambda_{i_k}(A_n))) - \mathbf{E}(G(\lambda_{i_1}(A_n'), \dots, \lambda_{i_k}(A_n')))| \le n^{-c_0}.$

A preliminary version of Theorem 6 was first established by the authors in [50],
in the case^6 of bulk eigenvalues (thus $\delta n \le i_1, \dots, i_k \le (1 - \delta) n$ for some absolute
constant $\delta > 0$). In [47], the restriction to the bulk was removed; and in [51],
Condition C0 was relaxed to a finite moment condition. We will discuss the proof
of this theorem in Section 5. There is strong evidence that the condition of four
matching moments is necessary to obtain the conclusion (12): see [15].

A key technical result used in the proof of the Four Moment Theorem, which is 
also of independent interest, is the gap theorem: 

Theorem 7 (Gap theorem). Let $M_n$ be a Wigner matrix obeying Condition C0.
Then for every $c_0 > 0$ there exists a $c_1 > 0$ (depending only on $c_0$) such that

    $\mathbf{P}(|\lambda_{i+1}(A_n) - \lambda_i(A_n)| \le n^{-c_0}) \ll n^{-c_1}$

for all $1 \le i < n$.

For reasons of space we will not discuss the proof of this theorem here, but refer the 
reader to [50], [51]. Among other things, the gap theorem tells us that eigenvalues of 
a Wigner matrix are usually simple. Closely related level repulsion estimates were 
established (under an additional smoothness hypothesis on the atom distributions) 
in [22].

Another variant of the Four Moment Theorem was subsequently introduced in
[25], in which the eigenvalues $\lambda_{i_j}(A_n)$ appearing in Theorem 6 were replaced by the
components of the resolvent (or Green's function) $(W_n - z)^{-1}$, but with slightly
different technical hypotheses on the matrices $M_n, M_n'$; see [25] for full details. As the



^6 In the paper, $k$ was held fixed, but an inspection of the argument reveals that it extends
without difficulty to the case when $k$ is as large as $n^{c_0}$, for $c_0$ small enough.
resolvent-based quantities are averaged statistics that sum over many eigenvalues,
they are far less sensitive to the eigenvalue repulsion phenomenon than the
individual eigenvalues, and as such the version of the Four Moment Theorem for Green's
function has a somewhat simpler proof (based on resolvent expansions rather than
the Hadamard variation formulae and Taylor expansion). Conversely, though, to
use the Four Moment Theorem for Green's function to control individual
eigenvalues, while possible, requires a significant amount of additional argument; see [34].
Finally, we remark that the Four Moment Theorem has also been extended to cover
eigenvectors as well as eigenvalues; see [15], [31] for details.



5. Sketch of proof of four moment theorem 



In this section we discuss the proof of Theorem 6, following the arguments that
originated in [50] and were refined in [51].

In addition to Theorem 7, a key ingredient is the following truncated version of
the Four Moment Theorem, in which one removes the event that two consecutive
eigenvalues are too close to each other. For technical reasons, we need to introduce
quantities

    $Q_i(A_n) := \sum_{j \ne i} \frac{1}{|\lambda_j(A_n) - \lambda_i(A_n)|^2}$

for $i = 1, \dots, n$, which is a regularised measure of the extent to which $\lambda_i(A_n)$ is close
to any other eigenvalue of $A_n$.

Theorem 8 (Truncated Four Moment Theorem). Let $c_0 > 0$ be a sufficiently small
constant. Let $M_n = (\xi_{ij})_{1 \le i,j \le n}$ and $M_n' = (\xi'_{ij})_{1 \le i,j \le n}$ be two Wigner matrices
obeying Condition C0. Assume furthermore that for any $1 \le i < j \le n$, $\xi_{ij}$ and
$\xi'_{ij}$ match to order 4 and for any $1 \le i \le n$, $\xi_{ii}$ and $\xi'_{ii}$ match to order 2. Set
$A_n := \sqrt{n} M_n$ and $A_n' := \sqrt{n} M_n'$, let $1 \le k \le n^{c_0}$ be an integer, and let

    $G = G(\lambda_{i_1}, \dots, \lambda_{i_k}, Q_{i_1}, \dots, Q_{i_k})$

be a smooth function from $\mathbb{R}^k \times \mathbb{R}_+^k$ to $\mathbb{R}$ that is supported in the region

(13)  $Q_{i_1}, \dots, Q_{i_k} \le n^{c_0}$

and obeys the derivative bounds

(14)  $|\nabla^j G(\lambda_{i_1}, \dots, \lambda_{i_k}, Q_{i_1}, \dots, Q_{i_k})| \le n^{c_0}$

for all $0 \le j \le 5$. Then

(15)  $\mathbf{E} G(\lambda_{i_1}(A_n), \dots, \lambda_{i_k}(A_n), Q_{i_1}(A_n), \dots, Q_{i_k}(A_n)) = \mathbf{E} G(\lambda_{i_1}(A_n'), \dots, \lambda_{i_k}(A_n'), Q_{i_1}(A_n'), \dots, Q_{i_k}(A_n')) + O(n^{-1/2+O(c_0)}).$



We will discuss the proof of this theorem shortly. Using Theorem 7, one can then
deduce Theorem 6 from Theorem 8 by smoothly truncating in the $Q$ variables: see
[50, §3.3].

It remains to establish Theorem 8. To simplify the exposition slightly, let us
assume that the matrices $M_n, M_n'$ are real symmetric rather than Hermitian.
As indicated in Section 4, the basic idea is to use the Lindeberg exchange strategy.
To illustrate the idea, let $\tilde M_n$ be the matrix formed from $M_n$ by replacing a single
entry $\xi_{pq}$ of $M_n$ with the corresponding entry $\xi'_{pq}$ of $M_n'$ for some $p < q$, with a
similar swap also being performed at the $\xi_{qp}$ entry to keep $\tilde M_n$ Hermitian. Strictly
speaking, $\tilde M_n$ is not a Wigner matrix as defined in Definition 1, as the entries are no
longer identically distributed, but this will not significantly affect the arguments.
(One also needs to perform swaps on the diagonal, but this can be handled in
essentially the same manner.)

Set $\tilde A_n := \sqrt{n} \tilde M_n$ as usual. We will sketch the proof of the claim that

    $\mathbf{E} G(\lambda_{i_1}(A_n), \dots, \lambda_{i_k}(A_n), Q_{i_1}(A_n), \dots, Q_{i_k}(A_n))$
    $= \mathbf{E} G(\lambda_{i_1}(\tilde A_n), \dots, \lambda_{i_k}(\tilde A_n), Q_{i_1}(\tilde A_n), \dots, Q_{i_k}(\tilde A_n)) + O(n^{-5/2+O(c_0)});$

by telescoping together $O(n^2)$ estimates of this sort one can establish (15). (For
swaps on the diagonal, one only needs an error term of $O(n^{-3/2+O(c_0)})$, since there
are only $O(n)$ swaps to be made here rather than $O(n^2)$. This is ultimately why
there are two fewer moment conditions on the diagonal than off it.)

We can write $A_n = A(\xi_{pq})$, $\tilde A_n = A(\xi'_{pq})$, where

    $A(t) = A(0) + t A'(0)$

is a (random) Hermitian matrix depending linearly^7 on a real parameter $t$, with
$A(0)$ being a Wigner matrix with one entry (and its adjoint) zeroed out, and $A'(0)$
is the explicit elementary Hermitian matrix

(16)  $A'(0) = e_p e_q^* + e_q e_p^*.$

We note the crucial fact that the random matrix $A(0)$ is independent of both $\xi_{pq}$
and $\xi'_{pq}$. Note from Condition C0 that we expect $\xi_{pq}, \xi'_{pq}$ to have size $O(n^{O(c_0)})$
most of the time, so we should (heuristically at least) be able to restrict attention
to the regime $t = O(n^{O(c_0)})$. If we then set

(17)  $F(t) := \mathbf{E} G(\lambda_{i_1}(A(t)), \dots, \lambda_{i_k}(A(t)), Q_{i_1}(A(t)), \dots, Q_{i_k}(A(t)))$

then our task is to show that

(18)  $\mathbf{E} F(\xi_{pq}) = \mathbf{E} F(\xi'_{pq}) + O(n^{-5/2+O(c_0)}).$

Suppose that we have Taylor expansions of the form

(19)  $\lambda_{i_l}(A(t)) = \lambda_{i_l}(A(0)) + \sum_{j=1}^4 c_{l,j} t^j + O(n^{-5/2+O(c_0)})$



^7 If we were working with Hermitian matrices rather than real symmetric matrices, then one
could either swap the real and imaginary parts of the $\xi_{ij}$ separately (exploiting the hypotheses
that these parts were independent), or else repeat the above analysis with $t$ now being a complex
parameter (or equivalently, two real parameters) rather than a real one. In the latter case, one
needs to replace all instances of single variable calculus below (such as Taylor expansion) with
double variable calculus, but aside from notational difficulties, it is a routine matter to perform
this modification.
for all $t = O(n^{O(c_0)})$ and $l = 1, \dots, k$, where the Taylor coefficients $c_{l,j}$ have size
$c_{l,j} = O(n^{-j/2+O(c_0)})$, and similarly for the quantities $Q_{i_l}(A(t))$. Then by using the
hypothesis (14) and further Taylor expansion, we can obtain a Taylor expansion

    $F(t) = F(0) + \sum_{j=1}^4 f_j t^j + O(n^{-5/2+O(c_0)})$

for the function $F(t)$ defined in (17), where the Taylor coefficients $f_j$ have size
$f_j = O(n^{-j/2+O(c_0)})$. Setting $t$ equal to $\xi_{pq}$ and taking expectations, and noting
that the Taylor coefficients $f_j$ depend only on $F$ and $A(0)$ and are thus independent
of $\xi_{pq}$, we conclude that

    $\mathbf{E} F(\xi_{pq}) = \mathbf{E} F(0) + \sum_{j=1}^4 (\mathbf{E} f_j)(\mathbf{E} \xi_{pq}^j) + O(n^{-5/2+O(c_0)})$

and similarly for $\mathbf{E} F(\xi'_{pq})$. If $\xi_{pq}$ and $\xi'_{pq}$ have matching moments to fourth order,
this gives (18).
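The final cancellation can be spelled out by subtracting the two expansions. The following is only a restatement of the step above under the same heuristic assumptions (independence of the coefficients $f_j$ from the swapped entries, and real entries, so that matching to order 4 means $\mathbf{E}\xi_{pq}^j = \mathbf{E}\xi_{pq}'^{\,j}$ for $j = 1, \dots, 4$):

```latex
\begin{align*}
\mathbf{E}F(\xi_{pq}) - \mathbf{E}F(\xi'_{pq})
  &= \sum_{j=1}^{4} (\mathbf{E} f_j)
     \left( \mathbf{E}\xi_{pq}^{j} - \mathbf{E}\xi_{pq}'^{\,j} \right)
     + O\!\left(n^{-5/2 + O(c_0)}\right) \\
  &= O\!\left(n^{-5/2 + O(c_0)}\right).
\end{align*}
```

If the entries matched only to order $m < 4$, the first surviving term would be of size $|\mathbf{E} f_{m+1}| = O(n^{-(m+1)/2+O(c_0)})$, which for off-diagonal swaps is too large to sum over $O(n^2)$ entries.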

It remains to establish (19) (as well as the analogue for $Q_{i_l}$, which turns
out to be analogous). We abbreviate $i_l$ simply as $i$. By Taylor's theorem with
remainder, it would suffice to show that

(20)  $\frac{d^j}{dt^j} \lambda_i(A(t)) = O(n^{-j/2+O(c_0)})$

for $j = 1, \dots, 5$. As it turns out, this is not quite true as stated, but it becomes true
(with overwhelming probability^8) if one can assume that $Q_i(A(t))$ is bounded by
$n^{O(c_0)}$. In principle, one can reduce to this case due to the restriction (13) on the
support of $G$, although there is a technical issue because one will need to establish
the bounds (20) for values of $t$ other than $0$ or $\xi_{pq}$. This difficulty can be overcome
by a continuity argument; see [50]. For the purposes of this informal discussion, we
shall ignore this issue and simply assume that we may restrict to the case where

(21)  $Q_i(A(t)) \ll n^{O(c_0)}.$

In particular, the eigenvalue $\lambda_i(A(t))$ is simple, which ensures that all quantities
depend smoothly on $t$ (locally, at least).

To prove (20), one can use the classical Hadamard variation formulae for the
derivatives of $\lambda_i(A(t))$, which can be derived for instance by repeatedly
differentiating the eigenvector equation $A(t) u_i(A(t)) = \lambda_i(A(t)) u_i(A(t))$. The formula for
the first derivative is

    $\frac{d}{dt} \lambda_i(A(t)) = u_i(A(t))^* A'(0) u_i(A(t)).$

But recall from eigenvector delocalisation (Corollary 4) that with overwhelming
probability, all coefficients of $u_i(A(t))$ have size $O(n^{-1/2+o(1)})$; given the nature of
the matrix (16), we can then obtain (20) in the $j = 1$ case.

^8 Technically, each value of $t$ has a different exceptional event of very small probability for which
the estimates fail. Since there are uncountably many values of $t$, this could potentially cause a
problem when applying the union bound. In practice, though, it turns out that one can restrict
$t$ to a discrete set, such as the multiples of $n^{-100}$, in which case the union bound can be applied
without difficulty. See [50] for details.
Now consider the $j = 2$ case. The second derivative formula reads

    $\frac{d^2}{dt^2} \lambda_i(A(t)) = -2 \sum_{j \ne i} \frac{|u_i(A(t))^* A'(0) u_j(A(t))|^2}{\lambda_j(A(t)) - \lambda_i(A(t))}.$

Using eigenvector delocalisation as before, we see with overwhelming probability that
the numerator is $O(n^{-1+o(1)})$. To deal with the denominator, one has to exploit
the hypothesis (21) and the local semicircle law (Theorem 3). Using these tools,
one can conclude (20) in the $j = 2$ case with overwhelming probability.
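The first and second Hadamard variation formulae can be sanity-checked numerically in the simplest possible setting: a 2 x 2 real symmetric family $A(t) = A(0) + tA'(0)$ with $A(0) = \mathrm{diag}(a, d)$ and $A'(0) = e_1 e_2^* + e_2 e_1^*$, where the eigenpairs are available in closed form. This toy case is an assumption made for illustration (it says nothing about the size bounds (20), which concern large random matrices); the second derivative is taken in the form $-2 \sum_{j \ne i} |u_i^* A'(0) u_j|^2 / (\lambda_j - \lambda_i)$, and all names and parameter values are hypothetical.

```python
import math

def eig2(a, d, t):
    """Eigenpairs of [[a, t], [t, d]] in increasing order; assumes t != 0."""
    h = (a - d) / 2.0
    r = math.hypot(h, t)
    lam1, lam2 = (a + d) / 2.0 - r, (a + d) / 2.0 + r
    v1 = (t, lam1 - a)            # solves (A - lam1) v = 0
    norm = math.hypot(v1[0], v1[1])
    u1 = (v1[0] / norm, v1[1] / norm)
    u2 = (-u1[1], u1[0])          # unit vector orthogonal to u1
    return (lam1, u1), (lam2, u2)

def quad(u, v):
    """u^T A'(0) v for A'(0) = [[0, 1], [1, 0]]."""
    return u[0] * v[1] + u[1] * v[0]

a, d, t = 0.3, -0.4, 0.5          # arbitrary test parameters
(lam1, u1), (lam2, u2) = eig2(a, d, t)

# Hadamard formulae for the smallest eigenvalue lambda_1(A(t)).
first_formula = quad(u1, u1)
second_formula = -2.0 * quad(u1, u2) ** 2 / (lam2 - lam1)

# Finite-difference approximations of the same derivatives.
def lam(s):
    return eig2(a, d, s)[0][0]

e1, e2 = 1e-6, 1e-4
first_fd = (lam(t + e1) - lam(t - e1)) / (2.0 * e1)
second_fd = (lam(t + e2) - 2.0 * lam(t) + lam(t - e2)) / (e2 * e2)
```

In this toy case one can also verify by hand that $\lambda_1(A(t)) = \frac{a+d}{2} - \sqrt{h^2 + t^2}$ with $h = (a-d)/2$, so that the first derivative is $-t/\sqrt{h^2+t^2}$ and the second is $-h^2/(h^2+t^2)^{3/2}$.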

It turns out that one can continue this process for higher values of $j$, although the
formulae for the derivatives for $\lambda_i(A(t))$ (and related quantities, such as $P_i(A(t))$
and $Q_i(A(t))$) become increasingly complicated, being given by a certain recursive
formula in $j$. See [50] for details.



6. Distribution of individual eigenvalues 



One of the simplest applications of the above machinery is to extend the central
limit theorem (9) of Gustavsson [30] for eigenvalues $\lambda_i(A_n)$ in the bulk from GUE
to more general ensembles:

Theorem 9. The gaussian fluctuation law (9) continues to hold for Wigner
matrices obeying Condition C0, and whose atom distributions match that of GUE to
second order on the diagonal and fourth order off the diagonal; thus, one has

    $\frac{\lambda_i(A_n) - \lambda_i^{cl}(A_n)}{\sqrt{\log n / 2\pi^2} / \rho_{sc}(u)} \to N(0,1)_{\mathbb{R}}$

whenever $\lambda_i^{cl}(A_n) = n(u + o(1))$ for some fixed $-2 < u < 2$.



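Proof. Let $M_n'$ be drawn from GUE, thus by (9) one already has

    $\frac{\lambda_i(A_n') - \lambda_i^{cl}(A_n')}{\sqrt{\log n / 2\pi^2} / \rho_{sc}(u)} \to N(0,1)_{\mathbb{R}}$

(note that $\lambda_i^{cl}(A_n) = \lambda_i^{cl}(A_n')$). To conclude the analogous claim for $A_n$, it suffices
to show that

(22)  $\mathbf{P}(\lambda_i(A_n') \in I_-) - n^{-c_0} \le \mathbf{P}(\lambda_i(A_n) \in I) \le \mathbf{P}(\lambda_i(A_n') \in I_+) + n^{-c_0}$

for all intervals $I = [a, b]$, and $n$ sufficiently large, where $I_+ := [a - n^{-c_0/10}, b +
n^{-c_0/10}]$ and $I_- := [a + n^{-c_0/10}, b - n^{-c_0/10}]$.

We will just prove the second inequality in (22), as the first is very similar. We
define a smooth bump function $G : \mathbb{R} \to \mathbb{R}^+$ equal to one on $I$ and vanishing
outside of $I_+$. Then we have

    $\mathbf{P}(\lambda_i(A_n) \in I) \le \mathbf{E} G(\lambda_i(A_n))$

and

    $\mathbf{E} G(\lambda_i(A_n')) \le \mathbf{P}(\lambda_i(A_n') \in I_+).$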
On the other hand, one can choose $G$ to obey (11). Thus by Theorem 6 we have

    $|\mathbf{E} G(\lambda_i(A_n)) - \mathbf{E} G(\lambda_i(A_n'))| \le n^{-c_0}$

and the second inequality in (22) follows from the triangle inequality. The first
inequality is similarly proven using a smooth function that equals 1 on $I_-$ and
vanishes outside of $I$. □

Remark 10. In [30] the asymptotic joint distribution of $k$ distinct eigenvalues
$\lambda_{i_1}(M_n), \dots, \lambda_{i_k}(M_n)$ in the bulk of a GUE matrix $M_n$ was computed (it is a
gaussian k-tuple with an explicit covariance matrix). By using the above argument,
one can extend that asymptotic for any fixed $k$ to other Wigner matrices, so long
as they match GUE to fourth order off the diagonal and to second order on the
diagonal.

If one could extend the results in [30] to broader ensembles of matrices, such as
gauss divisible matrices, then the above argument would allow some of the moment
matching hypotheses to be dropped, using tools such as Lemma 13.

Remark 11. Recently in [T^, a moderate deviations property of the distribution of
the eigenvalues $\lambda_i(A_n)$ was established first for GUE, and then extended to the same
class of matrices considered in Theorem 9 by using the Four Moment Theorem. An
analogue of Theorem 9 for real symmetric matrices (using GOE instead of GUE)
was established in [40].

There are similar results at the edge of the spectrum, though with several addi- 
tional technicalities; see [151 lig] [551 ITT] 15^ . 



7. The Wigner-Dyson-Mehta conjecture 



We now consider the extent to which the asymptotic (8), which asserts that the
normalised k-point correlation functions $\rho^{(k)}_{n,u}$ converge to the universal limit $\rho^{(k)}_{Sine}$,
can be extended to more general Wigner ensembles. A long-standing conjecture of
Wigner, Dyson, and Mehta (see e.g. [S^) asserts (informally speaking) that (8) is
valid for all fixed $k$, all Wigner matrices and all fixed energy levels $-2 < u < 2$ in
the bulk. However, to make this conjecture precise one has to specify the nature of
convergence in (8). For GUE, the convergence is quite strong (in the local uniform
sense), but one cannot expect such strong convergence in general, particularly in
the case of discrete ensembles in which $\rho^{(k)}_{n,u}$ is a discrete probability distribution (i.e.
a linear combination of Dirac masses) and thus is unable to converge uniformly or
pointwise to the continuous limiting distribution $\rho^{(k)}_{Sine}$. We will thus instead settle
for the weaker notion of vague convergence. More precisely, we say that (8) holds
in the vague sense if one has

(23)  $\lim_{n \to \infty} \int_{\mathbb{R}^k} F(x_1,\dots,x_k) \rho^{(k)}_{n,u}(x_1,\dots,x_k)\,dx_1 \dots dx_k = \int_{\mathbb{R}^k} F(x_1,\dots,x_k) \rho^{(k)}_{Sine}(x_1,\dots,x_k)\,dx_1 \dots dx_k$

for all continuous, compactly supported functions $F : \mathbb{R}^k \to \mathbb{R}$. By the
Stone-Weierstrass theorem we may take $F$ to be a test function (i.e. smooth and compactly
supported) without loss of generality.
The Wigner-Dyson-Mehta conjecture is largely resolved in the vague convergence
category:

Theorem 12 (Wigner-Dyson-Mehta conjecture in the vague sense). Let $M_n$ be a
Wigner matrix obeying Condition C0, and let $-2 < u < 2$ and $k \ge 1$ be fixed. Then
(8) holds in the vague sense.

This theorem, proven in [52] , builds upon a long sequence of partial results towards 
the Wigner-Dyson-Mehta conjecture [SI HZl H [SSI [H [211 [H], which we will 
summarise (in a slightly non-chronological order) below. As recalled in Section 3,
the asymptotic (8) for GUE (in the sense of locally uniform convergence, which is
far stronger than vague convergence) follows as a consequence of the Gaudin-Mehta
formula and the Plancherel-Rotach asymptotics for Hermite polynomials^9.

The next major breakthrough was by Johansson [31], who, as discussed previously,
established (8) for gauss divisible ensembles at some fixed time parameter $t > 0$
independent of $n$, in the vague sense (in fact, the slightly stronger
notion of weak convergence was established in that paper, in which the
function $F$ in (23) was allowed to merely be $L^\infty$ and compactly supported, rather than
continuous and compactly supported). The main tool used in [31] was an explicit
determinantal formula for the correlation functions in the gauss divisible case,
essentially due to Brézin and Hikami [6].

In Johansson's result, the time parameter $t > 0$ had to be independent of $n$. It was
realized by Erdős, Ramírez, Schlein, and Yau that one could obtain many further
cases of the Wigner-Dyson-Mehta conjecture if one could extend Johansson's result
to much shorter times $t$ that decayed at a polynomial rate in $n$. This was first
achieved (again in the context of weak convergence) for $t > n^{-3/4+\varepsilon}$ for an arbitrary
fixed $\varepsilon > 0$ in [17], and then to the essentially optimal case $t > n^{-1+\varepsilon}$ (for weak
convergence, and (implicitly) in the local sense as well) in [18]. By combining
this with the method of reverse heat flow discussed in Section 3, the asymptotic (8)
(again in the sense of weak convergence) was established for all Wigner matrices
whose distribution obeyed certain smoothness conditions (e.g. when $k = 2$ one
needs a type condition), and also decayed exponentially. The methods used in
[18] were an extension of those in [31], combined with an approximation argument
(the "method of time reversal") that approximated a continuous distribution by a
gauss divisible one (with a small value of $t$); the arguments in [17] are based instead
on an analysis of the Dyson Brownian motion.

By combining the above observation with the moment matching lemma presented 
below, one immediately concludes Theorem 12 assuming that the off-diagonal atom 
distributions are supported on at least three points. 

Lemma 13 (Moment matching lemma). Let ξ be a real random variable with 
mean zero, variance one, finite fourth moment, and which is supported on at least 
three points. Then there exists a gauss divisible, exponentially decaying real random 
variable ξ' that matches ξ to fourth order. 

^Analogous results are known for much wider classes of invariant random matrix ensembles, 
see e.g. [9], [42], [5]. However, we will not discuss these results further here, as they do not directly 
impact on the case of Wigner ensembles. 



For a proof of this lemma, see [50, Lemma 28]. The requirement of support on at 
least three points is necessary; indeed, if ξ is supported on just two points a, b, then 
E(ξ - a)^2(ξ - b)^2 = 0, and so any other distribution that matches ξ to fourth order 
must also be supported on a, b and thus cannot be gauss divisible. 
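
The two-point obstruction can be spelled out as a short computation. The following is only a sketch: the symbol \eta for the competing random variable, the polynomial q, and the coefficients c_0, ..., c_4 are notation introduced here for illustration, and "matching to fourth order" is read as equality of the first four moments of real random variables.

```latex
% q is a fixed nonnegative polynomial of degree 4 vanishing exactly at a and b.
q(x) := (x-a)^2 (x-b)^2 = \sum_{j=0}^{4} c_j x^j, \qquad c_4 = 1.

% If \eta matches \xi to fourth order, then E \eta^j = E \xi^j for j = 1,...,4,
% so by linearity of expectation
\mathbf{E}\, q(\eta) = \sum_{j=0}^{4} c_j\, \mathbf{E}\, \eta^j
                     = \sum_{j=0}^{4} c_j\, \mathbf{E}\, \xi^j
                     = \mathbf{E}\, q(\xi) = 0.

% Since q \geq 0 everywhere and q(x) = 0 only for x \in \{a, b\},
\mathbf{P}( \eta \in \{a, b\} ) = 1.
```

A gauss divisible random variable with t > 0 contains an independent Gaussian component and therefore has a density, so it cannot be concentrated on the two points a, b.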

To remove the requirement that the atom distributions be supported on at least 
three points, one can observe from the proof of the four moment theorem that one 
only needs the moments of M_n and M'_n to approximately match to fourth order 
in order to be able to transfer results on the distribution of spectra of M_n to that 
of M'_n. In particular, if t = n^{-1+ε} for some small ε > 0, then the gauss divisible 
matrix M'_n associated to M_n at time t is already close enough to matching the 
first four moments of M_n to apply (a version of) the Four Moment Theorem. The 
results of [18] give the asymptotic (8) for M'_n, and the eigenvalue rigidity property 
(4) then allows one to transfer this property to M_n, giving Theorem 12. 
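
As a rough numerical illustration of why a short time t gives approximate moment matching, the sketch below computes the exact first four moments of a gauss divisible variable built from a two-point atom distribution. It assumes (this is not stated in the present section) that the gauss divisible variable has the Ornstein-Uhlenbeck form xi_t = e^{-t/2} xi + (1 - e^{-t})^{1/2} g, with g a standard real Gaussian independent of xi; the function name, the choice of symmetric Bernoulli atoms, and the value EPS = 0.1 are hypothetical and chosen only for the example.

```python
import math


def gauss_divisible_moments(xi_moments, t):
    """Exact moments 1..4 of xi_t = e^{-t/2} * xi + sqrt(1 - e^{-t}) * g.

    ASSUMPTION: Ornstein-Uhlenbeck form of a gauss divisible variable,
    with g a standard real Gaussian independent of xi.
    xi_moments = (E xi, E xi^2, E xi^3, E xi^4).
    """
    a = math.exp(-t / 2.0)   # coefficient of xi
    a2 = math.exp(-t)        # a squared
    b2 = -math.expm1(-t)     # 1 - e^{-t}, the variance of the Gaussian part
    m1, m2, m3, m4 = xi_moments
    # Expand (a*xi + b*g)^j binomially and use independence together with
    # E g = E g^3 = 0, E g^2 = 1, E g^4 = 3.
    n1 = a * m1
    n2 = a2 * m2 + b2
    n3 = a * a2 * m3 + 3.0 * a * b2 * m1
    n4 = a2 * a2 * m4 + 6.0 * a2 * b2 * m2 + 3.0 * b2 * b2
    return (n1, n2, n3, n4)


# Symmetric Bernoulli atom (+1 or -1 with probability 1/2): a two-point law,
# which is exactly the case not covered by the moment matching lemma.
BERNOULLI = (0.0, 1.0, 0.0, 1.0)
EPS = 0.1  # hypothetical value of the small exponent

for n in (10**2, 10**4, 10**6):
    t = n ** (-1.0 + EPS)
    moments = gauss_divisible_moments(BERNOULLI, t)
    gap = moments[3] - BERNOULLI[3]
    print(f"n={n:>8}  t={t:.3e}  fourth-moment gap={gap:.3e}")
```

Under these assumptions the first three moments agree exactly, and the fourth moment gap works out to 2(1 - e^{-2t}), which is at most 4t and hence decays polynomially in n when t = n^{-1+ε}.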

Remark 14. The above presentation (drawn from the most recent paper [52]) is 
somewhat ahistorical, as the arguments used above emerged from a sequence of 
papers, which obtained partial results using the best technology available at the 
time. In the paper [50], where the first version of the Four Moment Theorem was 
introduced, the asymptotic (8) was established under the additional assumptions of 
Condition C0, and matching the GUE to fourth order; the former hypothesis was 
due to the weaker form of the four moment theorem known at the time, and the latter 
was due to the fact that the eigenvalue rigidity result (4) was not yet established 
(and was instead deduced from the results of Gustavsson [30] combined with the 
Four Moment Theorem, thus necessitating the matching moment hypothesis). For 
related reasons, the paper [19] (which first introduced the use of an approximate 
Four Moment Theorem) was only able to establish (8) after an additional averaging 
in the energy parameter u (and with Condition C0). The subsequent progress in 
[23] via heat flow methods gave an alternate approach to establishing (8), but also 
required an averaging in the energy and a hypothesis that the atom distributions be 
supported on at least three points, although the latter condition was then removed 
in [26]. In a very recent paper [16], Condition C0 has been relaxed to finite (4+ε)-th 
moment of the entries for any fixed ε > 0, though still at the cost of averaging in 
the energy parameter. Some generalisations in other directions (e.g. to covariance 
matrices, or to generalised Wigner ensembles with non-constant variances) were 
also established in [3], [5T], [H], [15], [IS], [15], [H], [S3]. 

Remark 15. While Theorem 12 is the "right" result for discrete Wigner ensembles 
(except for the hypothesis of Condition C0, which in view of the results in 
[16] should be relaxed significantly), one expects stronger notions of convergence 
when one has more smoothness hypotheses on the atom distribution; in particular, 
one should have local uniform convergence of the correlation functions when the 
distribution is smooth enough. Some very recent progress in this direction in the 
k = 1 case was obtained by Maltsev and Schlein [37], [38]. 